A Fast Fertility Hidden Markov Model for Word Alignment Using MCMC

نویسندگان

  • Shaojun Zhao
  • Daniel Gildea
چکیده

A word in one language can be translated to zero, one, or several words in other languages. Using word fertility features has been shown to be useful in building word alignment models for statistical machine translation. We built a fertility hidden Markov model by adding fertility to the hidden Markov model. This model not only achieves lower alignment error rate than the hidden Markov model, but also runs faster. It is similar in some ways to IBM Model 4, but is much easier to understand. We use Gibbs sampling for parameter estimation, which is more principled than the neighborhood method used in IBM Model 4.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exact Maximum Inference for the Fertility Hidden Markov Model

The notion of fertility in word alignment (the number of words emitted by a single state) is useful but difficult to model. Initial attempts at modeling fertility used heuristic search methods. Recent approaches instead use more principled approximate inference techniques such as Gibbs sampling for parameter estimation. Yet in practice we also need the single best alignment, which is difficult ...

متن کامل

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

متن کامل

Type-based MCMC for Sampling Tree Fragments from Forests

This paper applies type-based Markov Chain Monte Carlo (MCMC) algorithms to the problem of learning Synchronous Context-Free Grammar (SCFG) rules from a forest that represents all possible rules consistent with a fixed word alignment. While type-based MCMC has been shown to be effective in a number of NLP applications, our setting, where the tree structure of the sentence is itself a hidden var...

متن کامل

Extensions to HMM-based Statistical Word Alignment Models

This paper describes improved HMM-based word level alignment models for statistical machine translation. We present a method for using part of speech tag information to improve alignment accuracy, and an approach to modeling fertility and correspondence to the empty word in an HMM alignment model. We present accuracy results from evaluating Viterbi alignments against human-judged alignments on ...

متن کامل

Extentions to HMM-based Statistical Word Alignment Models

This paper describes improved HMM-based word level alignment models for statistical machine translation. We present a method for using part of speech tag information to improve alignment accuracy, and an approach to modeling fertility and correspondence to the empty word in an HMM alignment model. We present accuracy results from evaluating Viterbi alignments against human-judged alignments on ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010